Improving Twitter Retrieval by Exploiting Structural Information
Abstract
Most Twitter search systems treat a tweet as plain text when modeling relevance. However, a series of conventions allows users to tweet in structured ways, combining different blocks of text. These blocks include plain text, hashtags, links, mentions, etc. Each block encodes a different kind of communicative intent, and the sequence of these blocks captures the shifting discourse. Previous work has shown that exploiting structural information can improve the retrieval of structured documents (e.g., web pages). In this paper, we use the structure of tweets, induced by these blocks, for Twitter retrieval. A set of features derived from the blocks of text and their combinations is used in a learning-to-rank setting. We show that structuring tweets can achieve state-of-the-art performance. Our approach does not rely on social media features, but when we do add this additional information, performance improves significantly.
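To make the idea of block structure more concrete, here is a minimal Python sketch, not the authors' method, of how a tweet might be segmented into typed blocks (plain text, hashtags, links, mentions) and turned into simple features for a learning-to-rank model. The regular expressions, the block-type names, the feature names (`count:*`, `overlap:*`, `trans:*`, `num_blocks`), and the functions `segment_tweet` and `block_features` are all hypothetical; the abstract does not specify the exact segmentation rules or feature set.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed token patterns (hypothetical); the paper's exact segmentation rules are not given.
# Order matters: a token is assigned the first block type whose pattern matches its start.
_PATTERNS = [
    ("link", re.compile(r"https?://\S+")),
    ("mention", re.compile(r"@\w+")),
    ("hashtag", re.compile(r"#\w+")),
]

Block = Tuple[str, str]  # (block_type, block_text)


def _token_type(token: str) -> str:
    """Return the block type of a single whitespace-delimited token."""
    for name, pattern in _PATTERNS:
        if pattern.match(token):
            return name
    return "text"


def segment_tweet(tweet: str) -> List[Block]:
    """Split a tweet into typed blocks, merging consecutive plain-text tokens."""
    blocks: List[Block] = []
    for token in tweet.split():
        kind = _token_type(token)
        if kind == "text" and blocks and blocks[-1][0] == "text":
            # Extend the current plain-text block instead of starting a new one.
            blocks[-1] = ("text", blocks[-1][1] + " " + token)
        else:
            blocks.append((kind, token))
    return blocks


def _normalize(word: str) -> str:
    """Lowercase and drop a leading '#' or '@' so '#Artemis' matches 'artemis'."""
    return word.lower().lstrip("#@")


def block_features(blocks: List[Block], query: str) -> Counter:
    """Compute hypothetical per-block and block-sequence features for one (query, tweet) pair."""
    query_terms = {_normalize(t) for t in query.split()}
    feats: Counter = Counter()
    for kind, text in blocks:
        feats[f"count:{kind}"] += 1
        words = {_normalize(w) for w in text.split()}
        # Query-term overlap, accumulated separately for each block type.
        feats[f"overlap:{kind}"] += len(query_terms & words)
    # Adjacent block-type transitions as a crude encoding of the block sequence.
    types = [kind for kind, _ in blocks]
    for prev, nxt in zip(types, types[1:]):
        feats[f"trans:{prev}->{nxt}"] += 1
    feats["num_blocks"] = len(blocks)
    return feats


if __name__ == "__main__":
    blocks = segment_tweet("Watching the launch @NASA #Artemis https://t.co/xyz so exciting")
    print(blocks)
    print(block_features(blocks, "artemis launch"))
```

In a learning-to-rank setting, each (query, tweet) pair would yield one such feature dictionary, which could be mapped onto a fixed feature order and passed to any ranker. The transition counts are just one simple way to encode the block sequence that the abstract links to changing discourse; the paper may use different or richer features.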
Similar resources
Web Semantics: Science, Services and Agents on the World Wide Web
We propose the application of a novel sub-ontology extraction methodology for achieving interoperability and improving the semantic validity of information retrieval in the medical information systems (MIS) domain. The system offers advanced profiling of a user’s field of specialization by exploiting the concept of sub-ontology extraction, i.e., each sub-ontology may subsequently represent a pa...
A High-Performance Model based on Ensembles for Twitter Sentiment Classification
Background and Objectives: Twitter sentiment classification is one of the most popular fields in information retrieval and text mining. Millions of people around the world use social networks like Twitter intensively. Twitter allows users to publish tweets expressing what they think about various topics. Numerous websites on the Internet present Twitter content. The user can enter a sentiment ta...
Improving Twitter Sentiment Classification via Multi-Level Sentiment-Enriched Word Embeddings
Most existing work learns sentiment-specific word representations to improve Twitter sentiment classification, encoding both n-gram and distantly supervised tweet sentiment information in the learning process. These methods assume that all words within a tweet have the same sentiment polarity as the whole tweet, which ignores each word's own sentiment polarity. To address this problem, we propose to lear...
Exploiting Topical Perceptions over Multi-Lingual Text for Hashtag Suggestion on Twitter
Microblogging websites such as Twitter provide a seemingly endless amount of textual information on a wide variety of topics, generated by a large number of users. Microblog posts (tweets, on Twitter) are often written informally and in multilingual styles. Ignoring informal styles or multiple languages can hamper the usefulness of microblog mining applications. In this paper, w...
Exploiting Neural Embeddings for Social Media Data Analysis
In this paper, we describe our microblog real-time filtering system developed and submitted for the Text Retrieval Conference (TREC 2015) Microblog track. We submitted six runs for two tasks related to real-time filtering, using various Information Retrieval (IR) and Machine Learning (ML) techniques to analyze the Twitter sample live stream and match relevant tweets corresponding to specific ...